feat(query): add field existence, regex, +/- prefix, boost, and ? wildcard query support by Copilot · Pull Request #39 · mchurichi/peek

Copilot · 2026-03-03T01:14:13Z

Extends the Lucene-style query parser with the missing syntax features requested in the issue: field:* existence, field:/regex/, required/prohibited prefixes, boost acceptance, and ? single-character wildcards. Also fixes bare quoted phrases ("connection refused") which previously searched with the quotes included.

What

Go query engine (pkg/query/lucene.go)

ExistenceFilter — field:* matches entries where the field is present
RegexFilter — field:/regex/ applies a compiled regexp.Regexp; extractRegex() handles patterns containing () that the token reader would otherwise cut short
+ prefix consumed as a no-op (default AND); - prefix wraps the next filter in NotFilter
stripBoost() strips trailing ^n from tokens — accepted for syntax compatibility, ignored for filtering
? wildcard in WildcardFilter: detection extended to ContainsAny("*?"), converted to . in the regex
Bare quoted phrases strip the quotes before KeywordFilter (previously searched with literal " characters)
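
As an illustration of how the filter types listed above could compose, here is a minimal sketch. The names ExistenceFilter, RegexFilter, and NotFilter come from this PR; the `Entry` type, the `Filter` interface, and the `NewRegexFilter` helper are assumptions made for the example and may not match the actual code in pkg/query/lucene.go.

```go
package main

import (
	"fmt"
	"regexp"
)

// Entry is a hypothetical stand-in for the fields of one parsed log entry.
type Entry map[string]string

// Filter is an assumed interface; the real one in pkg/query may differ.
type Filter interface {
	Match(e Entry) bool
}

// ExistenceFilter matches entries where Field is present (field:*).
type ExistenceFilter struct{ Field string }

func (f ExistenceFilter) Match(e Entry) bool {
	_, ok := e[f.Field]
	return ok
}

// RegexFilter matches entries whose Field value matches Re (field:/regex/).
type RegexFilter struct {
	Field string
	Re    *regexp.Regexp
}

// NewRegexFilter is a hypothetical helper that compiles the pattern once.
func NewRegexFilter(field, pattern string) (RegexFilter, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return RegexFilter{}, err
	}
	return RegexFilter{Field: field, Re: re}, nil
}

func (f RegexFilter) Match(e Entry) bool {
	v, ok := e[f.Field]
	return ok && f.Re.MatchString(v)
}

// NotFilter inverts the wrapped filter, as the - prefix does.
type NotFilter struct{ Inner Filter }

func (f NotFilter) Match(e Entry) bool { return !f.Inner.Match(e) }

func main() {
	e := Entry{"service": "api-gateway", "level": "ERROR"}
	rx, err := NewRegexFilter("service", `^api-(gateway|edge)$`)
	if err != nil {
		panic(err)
	}
	fmt.Println(ExistenceFilter{Field: "request_id"}.Match(e))
	fmt.Println(rx.Match(e))
	fmt.Println(NotFilter{Inner: ExistenceFilter{Field: "level"}}.Match(e))
}
```

Compiling the pattern at parse time means an invalid regex can be reported once as a query error instead of failing per entry.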

UI syntax highlighter (pkg/server/index.html)

New --peek-purple CSS var + .hl-regex class for regex literals
Word token reader gains inRegex state so ( / ) inside /regex/ don't terminate the token
+/- prefixes emitted as hl-op in both field-scoped and bare-keyword positions
tokenizeValue treats ? as a wildcard char alongside *

Tests

Unit tests for all new filter types and parser behaviors
e2e/lucene-query.spec.mjs — 8 Playwright tests: field existence, regex with alternation, FTS keyword, quoted phrase, +/- combined, wildcard, and UI highlighting

Why

The parser silently dropped several Lucene query constructs that users expect to work (field:*, field:/regex/, +term, -term, term^n), and bare quoted phrases matched nothing because the surrounding " were included in the keyword search string.

Related Issue

Closes #9

Testing Done

All Go unit tests pass (go test ./... -race -count=1)
All 8 new E2E tests pass; existing search.spec.mjs unaffected
go vet ./... clean; CodeQL: 0 alerts

Original prompt

This section details the original issue you should resolve

<issue_title>Add indexed query support for field existence, regex, and full-text search (FTS) while staying local-first</issue_title>
<issue_description>## Summary

Implement Lucene-style query string support for Peek so users can query logs with syntax that is as close to Lucene QueryParser as possible, focusing on:

Full-text search (FTS) on message and selected fields (analyzed)
Field existence queries (field:*)
Regex queries (field:/.../)
Wildcards (*, ?), phrases ("..."), required/prohibited (+, -), boosting (^), and boolean logic

Maintain local-first, single-binary distribution and the no-build-step UI model.

Motivation

Peek currently supports a small Lucene-like subset evaluated via scanning (with time-range key seeking). Users want Lucene-like expressiveness, specifically:

field:* existence
field:/regex/
real full-text search behavior (analysis/tokenization), not substring contains

This needs to work both for querying historical logs and for realtime filtering in the UI.

Goals

Accept Lucene-style query string syntax in the UI and API, staying as close to Lucene QueryParser as practical.
Add FTS with an analyzer-driven inverted index (default field behavior like Lucene).
Add field existence query semantics compatible with Lucene (field:*).
Add regex query semantics compatible with Lucene query string (field:/.../).
Keep single binary, local-only, embedded UI in pkg/server/index.html, no new frontend dependencies, immutable VanJS updates.
Add Playwright E2E tests for the new query features.

Non-goals (for this issue)

Remote collectors or multi-user deployments
Distributed search or external services
Full Solr/Elasticsearch feature parity (faceting, aggregations, scoring explanations, etc.)
Perfect Lucene scoring parity (ranking differences are acceptable; correctness of filtering is the priority)

Proposed approach (recommended)

Use an embedded Go search index to avoid implementing a full Lucene parser + inverted index from scratch.

Recommendation:

Use Bleve's query string query support as the parsing and execution engine for Lucene-like syntax.
Keep BadgerDB as the source of truth for stored log entries.

Rationale:

Query string syntax supports phrases, field scoping, regex, required/excluded operators, and boosting.
Bleve supports query types we need (regexp, wildcard, fuzzy, numeric/date ranges, query string).
Keeps local-first and single-binary (just adds a Go dependency and an on-disk index directory).

User-visible query syntax (Lucene-style)

Default field behavior (FTS)

Unfielded terms query the default field (configurable), recommended default: message (and optionally a composite field).
- timeout refused
- "connection refused"

Field scoping

service:api-gateway
level:ERROR

Field existence (Lucene semantics)

request_id:*
user_id:*

Semantics: field is present and has at least one term indexed.

Regex (Lucene query string style)

service:/^api-(gateway|edge)$/
user_id:/^usr-[0-9]{4}$/

Semantics: regex applies to indexed terms for that field.
Important note:

For keyword fields (not analyzed), the term is the full field value, so regex behaves like "regex over the full value".
For analyzed fields (like message), regex is term-level, not substring-over-full-text, consistent with Lucene behavior.
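
The keyword-versus-analyzed distinction can be sketched in plain Go. This is only an illustration: the lowercase-and-split-on-whitespace "analyzer" is an assumption standing in for a real one, and `matchKeyword` and `matchAnalyzed` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchKeyword treats the whole field value as a single term.
func matchKeyword(pattern, value string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(value), nil
}

// matchAnalyzed tests the regex against each term separately. Lowercasing and
// whitespace splitting stand in for a real analyzer (an assumption).
func matchAnalyzed(pattern, text string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	for _, term := range strings.Fields(strings.ToLower(text)) {
		if re.MatchString(term) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	const phrase = `^connection refused$`
	kw, _ := matchKeyword(phrase, "connection refused")
	an, _ := matchAnalyzed(phrase, "connection refused")
	fmt.Println(kw, an)

	const term = `^refused$`
	kw, _ = matchKeyword(term, "connection refused")
	an, _ = matchAnalyzed(term, "connection refused")
	fmt.Println(kw, an)
}
```

A pattern that spans whitespace can only match a keyword field, because no single analyzed term contains a space; a pattern anchored to one word behaves the opposite way.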

Wildcards

service:api*
request_id:req-?????? (if ? is supported)
message:*timeout* (term-level wildcard implications apply)
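
One way to evaluate `*` and `?` is to translate the wildcard pattern into an anchored regular expression. The sketch below assumes that approach; `isWildcard` and `wildcardToRegex` are hypothetical names, and every other character is quoted so that something like `.` in a value stays literal.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// isWildcard reports whether a value contains a wildcard character.
func isWildcard(s string) bool { return strings.ContainsAny(s, "*?") }

// wildcardToRegex converts * to ".*" and ? to "." and quotes everything else.
// The result is anchored so the pattern must cover the whole value.
func wildcardToRegex(pattern string) (*regexp.Regexp, error) {
	var b strings.Builder
	b.WriteString("^")
	for _, r := range pattern {
		switch r {
		case '*':
			b.WriteString(".*")
		case '?':
			b.WriteString(".")
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

func main() {
	re, err := wildcardToRegex("req-??????")
	if err != nil {
		panic(err)
	}
	fmt.Println(isWildcard("req-??????"), re.String())
	fmt.Println(re.MatchString("req-abc123"), re.MatchString("req-abc12"))
}
```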

Boolean and required/prohibited clauses

level:ERROR AND service:api
+level:ERROR -service:auth
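
A minimal sketch of required/prohibited prefix handling, assuming clauses are AND-ed by default so that `+` adds nothing and `-` negates the clause. The `Clause` type and `parseClauses` are hypothetical, and splitting on whitespace is a simplification that ignores quoted phrases and regex literals.

```go
package main

import (
	"fmt"
	"strings"
)

// Clause is a hypothetical parsed clause: the term and whether it is negated.
type Clause struct {
	Term    string
	Negated bool
}

// parseClauses strips a leading + (no-op under default AND) and turns a
// leading - into a negated clause. A lone "+" or "-" is kept as a plain term.
func parseClauses(query string) []Clause {
	var out []Clause
	for _, tok := range strings.Fields(query) {
		switch {
		case len(tok) > 1 && strings.HasPrefix(tok, "+"):
			out = append(out, Clause{Term: tok[1:]})
		case len(tok) > 1 && strings.HasPrefix(tok, "-"):
			out = append(out, Clause{Term: tok[1:], Negated: true})
		default:
			out = append(out, Clause{Term: tok})
		}
	}
	return out
}

func main() {
	for _, c := range parseClauses("+level:ERROR -service:auth timeout") {
		fmt.Printf("%+v\n", c)
	}
}
```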

Boosting

error^2 timeout
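
When ranking is not a goal, a boost can be accepted and discarded. The sketch below assumes a boost is a trailing `^` followed by a plain decimal number; `stripBoost` is named after the function in this PR, but the body here is an illustration, not the PR's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// isNumber reports whether s is made of digits with at most one dot.
func isNumber(s string) bool {
	digits, dots := 0, 0
	for _, r := range s {
		switch {
		case r >= '0' && r <= '9':
			digits++
		case r == '.':
			dots++
		default:
			return false
		}
	}
	return digits > 0 && dots <= 1
}

// stripBoost removes a trailing ^n from a token and leaves anything else
// untouched, including tokens where ^ is not followed by a number.
func stripBoost(tok string) string {
	i := strings.LastIndexByte(tok, '^')
	if i <= 0 || !isNumber(tok[i+1:]) {
		return tok
	}
	return tok[:i]
}

func main() {
	for _, tok := range []string{"error^2", "error^2.5", "timeout", "service:/^api/"} {
		fmt.Printf("%q -> %q\n", tok, stripBoost(tok))
	}
}
```

Requiring a numeric suffix keeps a `^` anchor inside a regex literal from being mistaken for a boost.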

Architecture changes

Storage remains unchanged

BadgerDB key format remains: log:{timestamp_nano}:{id}
LogEntry JSON stays as-is.

Add embedded index

Introduce an index directory (default under Peek data dir):

~/.peek/index (or ${db_path}/index)

Add configuration:

[search] enabled = true|false (default false initially)
[search] index_path = "~/.peek/index"
[search] default_field = "message"
[search] include_in_all = ["message", "raw"] (optional)
[search] field_mapping_mode = "dynamic|strict"

CLI flags:

--search (enable embedded index)
--search-index-path
--search-default-field
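
The proposed flags could be wired up with the standard `flag` package along these lines. This is a sketch only: `SearchConfig` and `parseSearchFlags` are hypothetical names, the defaults are taken from the configuration lines above, and Peek's actual CLI handling may be structured differently.

```go
package main

import (
	"flag"
	"fmt"
)

// SearchConfig is a hypothetical struct mirroring part of the [search] section.
type SearchConfig struct {
	Enabled      bool
	IndexPath    string
	DefaultField string
}

// parseSearchFlags parses the three proposed flags from args.
func parseSearchFlags(args []string) (SearchConfig, error) {
	cfg := SearchConfig{IndexPath: "~/.peek/index", DefaultField: "message"}
	fs := flag.NewFlagSet("peek", flag.ContinueOnError)
	fs.BoolVar(&cfg.Enabled, "search", false, "enable embedded index")
	fs.StringVar(&cfg.IndexPath, "search-index-path", cfg.IndexPath, "index directory")
	fs.StringVar(&cfg.DefaultField, "search-default-field", cfg.DefaultField,
		"field queried by unfielded terms")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseSearchFlags([]string{"--search", "--search-default-field", "raw"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```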

Index document model

Index one document per log entry with a stable doc id:

docID = "{timestamp_nano}:{id}"
Badger key can be derived: log:{timestamp_nano}:{id}
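
The mapping between index document IDs and Badger keys can be sketched as follows. The helper names are hypothetical, and the sketch assumes the timestamp is written as a plain decimal integer; any zero padding used by the real key format is not specified here.

```go
package main

import (
	"fmt"
	"strings"
)

const keyPrefix = "log:"

// docID builds "{timestamp_nano}:{id}".
func docID(tsNano int64, id string) string {
	return fmt.Sprintf("%d:%s", tsNano, id)
}

// badgerKey derives the storage key "log:{timestamp_nano}:{id}" from a doc ID.
func badgerKey(doc string) []byte {
	return []byte(keyPrefix + doc)
}

// docIDFromKey reverses badgerKey; ok is false if the prefix is missing.
func docIDFromKey(key []byte) (doc string, ok bool) {
	s := string(key)
	if !strings.HasPrefix(s, keyPrefix) {
		return "", false
	}
	return s[len(keyPrefix):], true
}

func main() {
	id := docID(1700000000000000000, "abc")
	key := badgerKey(id)
	back, ok := docIDFromKey(key)
	fmt.Println(id, string(key), back, ok)
}
```

Because the doc ID is the key minus a fixed prefix, an index hit can be resolved to its stored entry with a single key lookup and no extra mapping table.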

Indexed fields (suggested):

timestamp (datetime)
level (keyword)
message (text, analyzed)
raw (text or keyword, optional)
fields.* (dynamic)
- strings: keyword by default
- numbers: numeric
- booleans: boolean
- optional: allow marking specific fields as analyzed text via config (e.g. fields.stacktrace)

Query execution path

When search index is enabled:

`/query...

Fixes #38: Add indexed query support for field existence, regex, and full-text search (FTS) while staying local-first


mchurichi · 2026-03-03T06:41:59Z

@codex review this

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf09af7297


chatgpt-codex-connector · 2026-03-03T06:46:37Z

pkg/query/lucene.go

+	if len(value) >= 2 && value[len(value)-1] == '/' {
+		return value[1 : len(value)-1]


Continue parsing regex after escaped trailing slash

extractRegex currently treats any token ending in / as a complete regex literal, but readToken can stop early at ( or ). For a valid query like path:/foo\/(bar|baz)/, the partial token seen here is "/foo\\/"; this branch returns foo\\ and regexp.Compile fails, so valid regex queries are rejected whenever an escaped slash appears before a parenthesized part.
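
One possible shape for a fix is to scan the raw query from the opening slash to the first unescaped closing slash, instead of trusting a token that may have been cut at a parenthesis. The sketch below assumes that approach; `extractRegexAt` is a hypothetical name and signature, and it does not handle an unescaped `/` inside a character class.

```go
package main

import "fmt"

// extractRegexAt expects input[start] to be the opening '/'. It returns the
// regex body and the index just past the closing '/'. A backslash escapes the
// next byte, so "\/" does not end the literal.
func extractRegexAt(input string, start int) (body string, end int, ok bool) {
	if start < 0 || start >= len(input) || input[start] != '/' {
		return "", start, false
	}
	escaped := false
	for i := start + 1; i < len(input); i++ {
		switch {
		case escaped:
			escaped = false
		case input[i] == '\\':
			escaped = true
		case input[i] == '/':
			return input[start+1 : i], i + 1, true
		}
	}
	return "", start, false // no closing slash found
}

func main() {
	input := `path:/foo\/(bar|baz)/ level:ERROR`
	body, end, ok := extractRegexAt(input, len("path:"))
	fmt.Printf("%q %q %v\n", body, input[end:], ok)
}
```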


chatgpt-codex-connector · 2026-03-03T06:46:37Z

pkg/server/index.html

+                        if (c === ':' && j + 1 < text.length && text[j + 1] === '/') {
+                            inRegex = true


Limit regex-mode detection to field values that start with '/'

The highlighter switches to regex mode on any :/ sequence, even when it appears later inside a normal value. Queries such as url:https://example.com therefore get split as if https:/ started a regex literal, producing incorrect tokenization/highlighting for common URL filters. This should only trigger for the field/value delimiter followed immediately by /.
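
The rule suggested here can be stated as: only the first `:` in a token is the field/value delimiter, and regex mode applies only if the very next character is `/`. The highlighter itself is JavaScript in pkg/server/index.html; the sketch below expresses the same check in Go purely as an illustration, with `valueIsRegex` as a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// valueIsRegex reports whether tok looks like field:/regex..., using only the
// first ':' as the delimiter so later ":/" sequences in the value are ignored.
func valueIsRegex(tok string) bool {
	i := strings.IndexByte(tok, ':')
	return i > 0 && i+1 < len(tok) && tok[i+1] == '/'
}

func main() {
	for _, tok := range []string{"service:/^api/", "url:https://example.com", "level:ERROR"} {
		fmt.Println(tok, valueIsRegex(tok))
	}
}
```

In this sketch a bare, unfielded `https://example.com` would still read as field `https` with a regex value, so that case would need separate handling.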


Initial plan

e99fb7c

Copilot AI assigned Copilot and mchurichi Mar 3, 2026

Copilot started work on behalf of mchurichi March 3, 2026 01:14 View session

Copilot AI and others added 2 commits March 3, 2026 01:24

feat: add field existence, regex, +/- prefix, boost, and ? wildcard q…

7f7c7c9

…uery support Co-authored-by: mchurichi <633182+mchurichi@users.noreply.github.com>

refactor: clarify variable name and add regex tokenizer comments per …

cf09af7

…review Co-authored-by: mchurichi <633182+mchurichi@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add indexed query support for field existence and regex~~ feat(query): add field existence, regex, +/- prefix, boost, and ? wildcard query support Mar 3, 2026

Copilot finished work on behalf of mchurichi March 3, 2026 01:31

chatgpt-codex-connector bot reviewed Mar 3, 2026

Copilot wants to merge 3 commits into main from copilot/add-lucene-query-support
